Abstract 

Background: Quantitative trait loci (QTL) detection on a very large number of phenotypes, such as eQTL detection on 
transcriptomic data, can be dramatically impaired by the statistical properties of interval mapping methods. One 
major consequence is the high number of QTL detected at marker locations. The present study aims at 
identifying and specifying the sources of this bias, in particular for the analysis of data from outbred 
populations. Analytical developments were carried out in a backcross situation in order to specify the bias and to 
propose an algorithm to control it. The outbred population context was studied through simulated data sets in a 
wide range of situations. 

The likelihood ratio test was first analyzed under the "one QTL" hypothesis in a backcross population. Designs of sib 
families were then simulated and analyzed using the QTL Map software. On the basis of the theoretical results in 
backcross, parameters such as the population size, the density of the genetic map, the QTL effect and the true location 
of the QTL were taken into account under the "no QTL" and the "one QTL" hypotheses. A combination of two 
nonparametric tests, the Kolmogorov-Smirnov test and the Mann-Whitney-Wilcoxon test, was used in order to identify 
the parameters that affected the bias and to specify how much they influenced the estimation of QTL location. 

Results: A theoretical expression of the bias of the estimated QTL location was obtained for a backcross type 
population. We demonstrated a common source of bias under the "no QTL" and the "one QTL" hypotheses and 
qualified the possible influence of several parameters. Simulation studies confirmed that the bias exists in outbred 
populations under both the hypotheses of "no QTL" and "one QTL" on a linkage group. The QTL location was 
systematically closer to marker locations than expected, particularly in the case of low QTL effect, small population size 
or low density of markers, i.e. designs with low power. Practical recommendations for experimental designs for QTL 
detection in outbred populations are given on the basis of this bias quantification. Furthermore, an original algorithm is 
proposed to adjust the location of a QTL obtained with interval mapping when it co-locates with a marker. 

Conclusions: Therefore, one should be attentive when one QTL is mapped at the location of one marker, 
especially under low power conditions. 
Background 

For the last decade, several studies have shown that a 
large proportion of QTL are mapped at marker 
locations whenever linkage analysis is applied. In 
analyses of real datasets, this bias was first suspected 
by Spelman et al. [1], who observed a large 
proportion of significant test statistics at marker locations 
when looking for QTL in five milk production traits. 
Walling et al. [2] have described the influence of mar- 
kers in constructing the confidence intervals of QTL 
location and questioned whether QTL location was 
biased towards the location of markers instead of its 
true position. By applying the regression coefficients on 
the markers as suggested by Whittaker et al. [3], 
Walling et al. [4] calculated the proportion of putative QTL 
located at marker positions in a backcross population. 
They have reported a systematic bias for the estimated 
QTL position under the null hypothesis of the test, i.e. 
the hypothesis of no QTL on the linkage group. 
Moreover, linear regression methods for QTL 
detection have been reported to behave in the same way 
as maximum likelihood methods in interval 
mapping approaches [5]. The simulation studies 
by Walling et al. [4] have confirmed that these two 
approaches have similar biases on the estimated QTL 
position in a backcross population. 

These previous works have shown that a bias on the 
QTL location occurs when genetic linkage analysis for 
QTL mapping is used in a backcross population. How- 
ever, little research has been devoted to establishing 
which parameters give rise to that bias. What is more, no 
study has investigated how it affects linkage analysis 
applied to outbred populations. In order to address these 
shortcomings, the present study aims at identifying the 
sources of that bias, in particular regarding the analysis 
of data issued from outbred populations. 

This question is of critical importance in expression 
quantitative trait loci (eQTL) mapping. Indeed, a main 
objective is often to search for eQTL which co-localize 
with QTL influencing agronomical performance. 
The accuracy of the eQTL locations is thus a fundamen- 
tal element for experimental design optimization, espe- 
cially since experimental designs for gene expression 
analyses are generally of moderate size due to the cost of 
phenotyping. The pioneer work of eQTL detection can 
be traced back to the emergence of the concept of geneti- 
cal genomics [6]. During the past decade, QTL mapping 
was widely applied to the detection of eQTL, for example 
in yeast [7], mice [8], human [9,10], maize [11] and pig 
[12]. Generally, mapping procedures were used to map 
eQTL considering each transcript expression level as one 
quantitative trait in a trait by trait analysis. 

Recently, we have carried out linkage analyses by inter- 
val mapping on high throughput transcriptomic data 
from several familial QTL detection designs, in pig [13], 
in poultry [14] and in trout [15]. Because of the high 
dimensionality of the phenotypes, these eQTL analyses 
have highlighted the bias of the interval mapping 
estimation of the QTL location. We observed that the number 
of eQTL detected at marker locations was consistently 
higher than between marker locations: for instance, in 
the analysis of 6 665 gene expressions in a population of 
325 pigs, we found 756 eQTL on chromosome 18, 
distributed as shown in Figure 1. The eQTL were mapped 
significantly more often at marker locations than 
between marker locations. 

Hence, in order to qualify this possible bias on the esti- 
mated QTL location in outbred populations and to spe- 
cify which parameters influence it, this paper presents a 
study of the QTL location accuracy. Firstly, in order to 
make things more concrete, we explored the empirical 
distribution of the LRT along the linkage group under 
the null hypothesis of "no QTL" on a real dataset. Sec- 
ondly, analytical developments were carried out so as to 
identify the parameters which influence the QTL location 
accuracy. Since such developments are intractable for outbred 
populations, because of the complexity of the test statistic, the 
simpler case of a backcross type population, i.e. a 
backcross between inbred lines, was considered at that 
stage. Thirdly, designs of outbred sib families were simu- 
lated and analyzed in order to characterize the bias varia- 
bility under the null and the alternative ("one QTL 
segregating on the linkage group") hypotheses. Such 
parameters as population size and marker density under 
the null hypothesis, as well as QTL effect and simulated 
QTL location under the alternative hypothesis were 
taken into account. Finally, an approach as to how to 
adjust the QTL location estimation, for a QTL located at 
the position of one marker, was suggested. 

Results 

The LRT distribution along the linkage group under the 
null hypothesis 

By using the real pedigree and genotype structure from 
an experimental design in pig, 2 000 simulations of 
phenotypes under the null hypothesis of "no QTL on the 
linkage group" were performed. The distribution of the 
estimated QTL location, i.e. the location of the 
maximum LRT, was obtained (Figure 2) on chromosome 
SSC1, which carried 16 microsatellite markers (black 
points on the X axis). It clearly shows that a large 
proportion of QTL was found at a marker location. The 
histograms in Figure 3 show the empirical distributions 
of the 2 000 LRT at some locations on the chromosome, 
both markers and non-markers. The LRT at each 
location followed a $\chi^2$ distribution, with degrees of 
freedom ranging from 4.05 to 4.55, according to a 
Kolmogorov-Smirnov (KS) test at $\alpha$ = 0.01. This was 
generalized to all tested positions on the linkage group 
in Figure 4, where the distributions of the LRT, 
characterized by the number of degrees of freedom under the 
null hypothesis, were not very different at markers from 
the distributions obtained for positions between 
markers. 

Figure 1 eQTL mapping in pigs: example of distribution of the eQTL locations (chromosome 18). 756 eQTL were mapped on 
chromosome 18, which has 5 microsatellite markers located at 0 M, 0.08 M, 0.39 M, 0.54 M and 0.83 M, respectively (black points on the X axis). 

Figure 2 QTL mapping in pigs: empirical distribution of the estimated QTL location under H0 (chromosome 1). Results are based on 
2000 simulations in a population of 4 sires and 325 offspring. There are 16 markers on chromosome 1 (black points on the X axis). 

Figure 3 QTL mapping in pigs: empirical distribution of the LRT at various locations. Results are based on 2000 simulations in a 
population of 4 sires and 325 offspring. The line represents the density obtained with a kernel method, the dotted line represents the density of 
a $\chi^2$ with 4 d.f. 

Figure 4 QTL mapping in pigs: empirical degrees of freedom of the $\chi^2$ distribution. M indicates the locations of the markers. 

It should be noted (Figure 2) that the proportion of 
QTL locations estimated at the two extreme marker posi- 
tions of the linkage group was higher than for the other 
markers under the "no QTL" hypothesis. The asymptotic 
distribution of the LRT process at marker positions is 
known to be the square of an Ornstein-Uhlenbeck (OU) 
process as pointed out by Lander and Botstein [16] and 
proved by Cierco [17]. As indicated in Rabier et al. [18], 
when test statistics are performed only on markers, the 
OU process follows an autoregressive process of order 1. 
After 1 million simulations of this process, we found 
that the probability for the maximum of the OU process 
to be on the bounds is higher than within the interval 
(Figure 5). This property of the OU process is 
consistent with the large proportion of QTL that we 
observed at the extreme marker positions in 
comparison with the other markers. 
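
The boundary effect described above can be checked with a short simulation. The sketch below is not the authors' code: it assumes a stationary AR(1) process observed at equally spaced markers, with a lag-one correlation of exp(-2d) for a marker spacing d in Morgans (Haldane mapping function), and the function name `argmax_frequencies` and its default values are illustrative choices.

```python
import numpy as np


def argmax_frequencies(n_markers=5, spacing=0.2, n_sim=200_000, seed=0):
    """Frequency with which the squared AR(1) process peaks at each marker."""
    rng = np.random.default_rng(seed)
    # Assumption: correlation between adjacent markers under the Haldane function.
    rho = np.exp(-2.0 * spacing)
    noise = rng.standard_normal((n_sim, n_markers))
    x = np.empty_like(noise)
    x[:, 0] = noise[:, 0]  # stationary start, unit variance
    for j in range(1, n_markers):
        # AR(1) recursion that keeps the marginal variance equal to 1
        x[:, j] = rho * x[:, j - 1] + np.sqrt(1.0 - rho ** 2) * noise[:, j]
    peak = np.argmax(x ** 2, axis=1)  # marker index of the largest squared value
    return np.bincount(peak, minlength=n_markers) / n_sim


freq = argmax_frequencies()
print("frequency of the maximum at each marker:", np.round(freq, 3))
```

With these settings the two extreme markers should collect a larger share of the maxima than the central one, which is the pattern reported for Figure 5.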

The QTL location bias expression in a backcross population 

In order to investigate the bias on the QTL location under 
the hypothesis of "one QTL", we considered a linkage 
group limited to an interval $[0, T]$ between two markers, 
$M_1$ (alleles $M_1$ and $m_1$) at 0 and $M_2$ (alleles $M_2$ and $m_2$) at 
$T$, flanking a QTL (alleles $Q$ and $q$) in a backcross 
population obtained from the cross $M_1M_1QQM_2M_2 \times M_1m_1QqM_2m_2$. 

Let $y_k$ be the phenotypic value for the individual $k = 1, \ldots, n$. 
Assuming a QTL located at the location $t_0$, the 
genetic model for $y_k$ is: 

$$y_k = \mu + \frac{a}{2}\, g_k(t_0) + e_k, \qquad (1)$$

where $\mu$ denotes the overall mean, $a$ denotes the QTL 
allelic substitution effect, $g_k(t_0)$ is the genotypic value of 
$k$ at the QTL position, which takes the value 1 or -1 
depending on the QTL allele, $Q$ or $q$ respectively, 
received by $k$ from its heterozygous parent, and $e_k$ is a 
random normal variable with mean 0 and variance $\sigma^2$. 
In order to simplify calculations, we set $\mu = 0$ and $\sigma = 1$ 
and used a linearized likelihood function instead of a 
mixture of two normal distributions (e.g. [19]). In this 
case, the interval mapping method is the same as the 
regression method for QTL detection, and the model (1) 
is modified as follows: 

$$y_k = \frac{a}{2}\, x_k(t_0) + e_k, \qquad (2)$$
Figure 5 Empirical distribution of the maximum of the Ornstein-Uhlenbeck process. Results are based on 1 million simulations. 
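where $x_k(t_0) = E[g_k(t_0) \mid M_k]$ and $M_k$ denotes the 
genotype of the individual $k$ at markers $M_1$ and $M_2$. Let $\theta_{t,t'}$ 
be the recombination rate in the distance $|t - t'|$, then 
for all $k$ (see Appendix I): 

$$x_k(t) = \begin{cases}
(1 - \theta_{0,t} - \theta_{t,T})/(1 - \theta_{0,T}) & \text{if } M_k = M_1M_1M_2M_2 \\
(\theta_{t,T} - \theta_{0,t})/\theta_{0,T} & \text{if } M_k = M_1M_1M_2m_2 \\
(\theta_{0,t} + \theta_{t,T} - 1)/(1 - \theta_{0,T}) & \text{if } M_k = M_1m_1M_2m_2 \\
(\theta_{0,t} - \theta_{t,T})/\theta_{0,T} & \text{if } M_k = M_1m_1M_2M_2
\end{cases}$$

and the LRT at the position $t$ is calculated as 

$$LRT(t) = \frac{\left[\sum_k y_k\, x_k(t)\right]^2}{\sum_k x_k^2(t)}. \qquad (3)$$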



Note that, given a position $t$, $\{x_k(t)\}_{k=1,\ldots,n}$ is a 
sequence of random variables with the same 
expected value 0 and the same variance (see Appendix 
III), denoted $Var(x(t))$. Hence, if we replace $y_k$ by the 
model (2) in equation (3), then $LRT(t)$ consists of 
three terms (see Appendix II): 

$$LRT(t) = f(t) + \varepsilon_1(t) + \varepsilon_2(t),$$

where $f(t) = \frac{a^2}{4} \left[\sum_k x_k(t_0)\, x_k(t)\right]^2 / \sum_k x_k^2(t)$ is a 
function of the variable $t$, the noise $\varepsilon_1(t)$ is the LRT at $t$ under 
the no QTL hypothesis and the noise $\varepsilon_2(t)$ follows 
approximately a normal distribution with mean 0 and 
variance $n a^2\, Var(x(t))\, \rho_{t,t_0}^2$. Note that: 

$$\arg\max_{t \in [0,T]} LRT(t) = \arg\max_{t \in [0,T]} \frac{1}{n} LRT(t),$$

i.e. the bias on $LRT(t)$ behaves similarly to the bias on 
$LRT(t)/n$. This property allows us to analyze the source 
of the bias in $LRT(t)/n$ instead of $LRT(t)$. Using the law 
of large numbers, we have the following decomposition 
of $LRT(t)/n$: 

$$\frac{1}{n} LRT(t) \approx \frac{1}{4} a^2\, Var(x(t_0))\, \rho_{t,t_0}^2 + \frac{1}{n} \varepsilon_1(t) + \frac{1}{n} \varepsilon_2(t), \qquad (4)$$
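
The step from equation (3) to the three-term decomposition can be written out explicitly. What follows is a sketch under stated assumptions rather than a reproduction of Appendix II: it assumes that the LRT takes the regression form $[\sum_k y_k x_k(t)]^2 / \sum_k x_k^2(t)$ with unit residual variance, and the shorthands $S(t)$, $E(t)$ and $D(t)$ are introduced here only for readability.

```latex
\begin{align*}
S(t) &= \sum_k x_k(t_0)\,x_k(t), \qquad
E(t) = \sum_k e_k\,x_k(t), \qquad
D(t) = \sum_k x_k^2(t), \\
\sum_k y_k\,x_k(t) &= \frac{a}{2}\,S(t) + E(t)
  \qquad \text{(substituting model (2))}, \\
LRT(t) &= \frac{\left[\frac{a}{2}\,S(t) + E(t)\right]^2}{D(t)}
  = \underbrace{\frac{a^2}{4}\,\frac{S(t)^2}{D(t)}}_{f(t)}
  + \underbrace{\frac{E(t)^2}{D(t)}}_{\varepsilon_1(t)}
  + \underbrace{a\,\frac{S(t)\,E(t)}{D(t)}}_{\varepsilon_2(t)}.
\end{align*}
```

The cross term is linear in the residuals $e_k$, which is consistent with $\varepsilon_2(t)$ being approximately normal with mean 0, while $\varepsilon_1(t)$ does not involve $a$ and is therefore the statistic that would be obtained with no QTL.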



It can be seen from formula (4) that the first term 
reaches its maximum at $t_0$ since $\rho_{t,t_0} \to 1$ when $t \to t_0$. 
As seen in the previous section, the second term, which 
is proportional to the LRT at $t$ under the "no QTL" 
hypothesis, reaches its maximum at the position of 
markers more often than at the positions between markers. 
As a result, the estimated QTL location will be biased 
towards the position of markers. However, when $n$ or $a$ 
increases, or when $T$ decreases, or when $t_0$ approaches 
one marker location (see Appendix III), the deviation 
between the first two terms in formula (4) increases and 
the influence of the second term is reduced. Therefore, 
in our simple backcross population model, under the 
hypothesis of one QTL, when the population size, 
marker density or QTL effect increases, or when the true QTL 
location approaches the position of a marker, the bias of 
the estimated QTL location is expected to be reduced. 
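
The behaviour predicted by this decomposition can be explored numerically. The following is a minimal sketch, not the code used in the study: it assumes the Haldane mapping function (no interference), unit residual variance, the regression form of the LRT, and a single interval of 0.2 M scanned on a 1 cM grid. All function names and parameter values are illustrative.

```python
import numpy as np


def haldane(d):
    """Recombination rate for a map distance d in Morgans (no interference)."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(d)))


def expected_genotype(m1, m2, t, T):
    """x_k(t) = E[g_k(t) | markers]; m1 and m2 are coded +1 (M) or -1 (m)."""
    th_l, th_r, th_T = haldane(t), haldane(T - t), haldane(T)
    same = (1.0 - th_l - th_r) / (1.0 - th_T)  # non-recombinant marker pair
    diff = (th_r - th_l) / th_T                # recombinant marker pair
    return np.where(m1 == m2, m1 * same, m1 * diff)


def simulate_argmax(n, a, t0, T=0.2, n_grid=21, rng=None):
    """Simulate one backcross data set; return the grid index maximising LRT(t)."""
    rng = np.random.default_rng() if rng is None else rng
    grid = np.linspace(0.0, T, n_grid)
    m1 = rng.choice([-1.0, 1.0], size=n)
    # QTL allele given the left marker, then right marker given the QTL allele
    g = np.where(rng.random(n) < haldane(t0), -m1, m1)
    m2 = np.where(rng.random(n) < haldane(T - t0), -g, g)
    y = 0.5 * a * g + rng.standard_normal(n)  # model (1) with mu = 0, sigma = 1
    X = expected_genotype(m1[None, :], m2[None, :], grid[:, None], T)
    lrt = (X @ y) ** 2 / np.sum(X ** 2, axis=1)  # assumed regression form of the LRT
    return int(np.argmax(lrt))


def proportion_at_markers(n, a, t0, n_rep=1000, seed=0, n_grid=21):
    """Share of replicates whose maximum LRT falls on one of the two markers."""
    rng = np.random.default_rng(seed)
    idx = np.array([simulate_argmax(n, a, t0, n_grid=n_grid, rng=rng)
                    for _ in range(n_rep)])
    return float(np.mean((idx == 0) | (idx == n_grid - 1)))


for n in (100, 800):
    print(n,
          "no QTL:", proportion_at_markers(n, a=0.0, t0=0.1),
          "one QTL:", proportion_at_markers(n, a=1.0, t0=0.1))
```

In this toy setting, setting `a = 0` reproduces the "no QTL" case, and increasing `n` or `a` should lower the share of maxima falling on a marker under the "one QTL" case.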

Simulations under H0 

According to the preceding results, the estimated QTL 
location cannot be expected to be uniformly distributed 
on the chromosome under the null hypothesis of no 
QTL. Familial designs were simulated to test the influ- 
ence of the population size and the marker density on 
this bias in outbred populations. 
Impact of the population size 

Six different population sizes, from 60 to 800 progeny, 
were simulated with three markers located at 0 M, 0.2 
M and 0.4 M on a 0.4 M linkage group. The empirical 
distributions of the estimated QTL location, obtained 
from 5 000 simulations for each population size, showed 
that the probability of mapping a QTL at a marker loca- 
tion was always higher than that of mapping it between 
markers (Figure 6). As shown in Table 1, the proportion 
of QTL which co-localized with the markers appeared 
independent of the population size. When the 
distributions of the estimated QTL location obtained with 
the different population sizes were compared using the 
Kolmogorov-Smirnov test, large p-values were obtained 
(> 0.97). This indicates that the population size did not 
significantly influence the distribution of the estimated 
QTL location under the null hypothesis. 
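
A two-sample comparison of this kind can be sketched with `scipy.stats.ks_2samp`. The locations below are synthetic placeholders, not the simulated designs of the study: `draw_locations` is a hypothetical helper that puts a fixed share of the estimates on the three markers (the value 0.65 is chosen only to resemble the order of magnitude in Table 1) and spreads the rest uniformly.

```python
import numpy as np
from scipy.stats import ks_2samp


def draw_locations(n_draws, rng, markers=(0.0, 0.2, 0.4), p_marker=0.65):
    """Placeholder 'estimated QTL locations': point masses on markers plus a uniform part."""
    at_marker = rng.random(n_draws) < p_marker
    loc = rng.uniform(markers[0], markers[-1], size=n_draws)
    loc[at_marker] = rng.choice(markers, size=int(at_marker.sum()))
    return loc


rng = np.random.default_rng(0)
loc_design_a = draw_locations(5000, rng)  # stands in for one population size
loc_design_b = draw_locations(5000, rng)  # stands in for another population size

result = ks_2samp(loc_design_a, loc_design_b)
print("KS statistic:", result.statistic, "p-value:", result.pvalue)
```

Because both samples are drawn from the same placeholder distribution, the KS statistic should be small, which mirrors the situation reported here for the different population sizes under the null hypothesis.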
Impact of the marker density 

The empirical distributions of estimated QTL locations 
were compared for five marker densities, with 2, 3, 5, 7 and 11 
markers equally distributed on a 0.6 M linkage group 
(Figure 7). The bias towards the marker positions was 
systematic whatever the marker density. Except for 11 
markers, the number of markers had little influence on 
the proportion of QTL located at a marker position 
(Table 2), i.e. the bias seemed to be shared out between 
markers. However, this criterion was difficult to interpret 
because the number of markers was different in each of 
the cases studied. Moreover, in all cases, the two 
extreme markers concentrated more false locations than 
intermediate markers did. Since the number of markers 
was different, the distributions based on the marker 
density were expected to be different as well. The KS 
test was thus not suitable to test the influence of the 
marker density. 

Simulations under H1 

According to the analytical results obtained in a back- 
cross type population, the population size and the mar- 
ker density, as well as the QTL effect and location, were 
parameters which were very likely to influence the bias 
on the estimated QTL location under the alternative 
hypothesis. 

Figure 6 Empirical distribution of the estimated QTL location according to the population size under H0. Num. Offsp indicates the 
number of offspring in the outbred population. Results are based on 5000 simulations per case. There were three markers at 0 M, 0.2 M and 0.4 
M. The line represents the density obtained with a kernel method. 

Impact of the population size 

Three sizes of population were simulated: 100, 300 or 
800 progeny. The empirical distributions of the esti- 
mated QTL location are given in Figure 8. One QTL 
was simulated at the 0.1 M location (A on the X axis) 
on a 0.4 M linkage group with three markers located at 
0 M, 0.2 M and 0.4 M. Figure 8 clearly shows that the 
bias of the QTL location varied with 
the population size. It can be seen from Table 3 that the 
proportion of putative QTL that co-localized with a 
marker and the root mean square error (RMSE) of the 
QTL location became smaller when the population size 
increased. Table 3 also shows that the bias was 
correlated with the power of the analysis. However, considering 
only very significant LRT ($\alpha$ = 0.01), significant LRT ($\alpha$ = 
0.05) or all LRT ($\alpha$ = -) led to very similar values of 
RMSE or of proportion of QTL mapped at a marker 
location. The KS test indicated that the distributions of 
the estimated QTL position were significantly different 
depending on the size of the population studied 
(p-values = 0). Moreover, the Mann-Whitney-Wilcoxon 
(MWW) test showed that, when the population size 
increased, the median of the error of the estimated QTL 
position significantly decreased (p-values < 2.2e-16). 
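
The two summary tools used here, the RMSE of the estimated location and a one-sided MWW comparison of location errors, can be sketched as follows. The estimates are synthetic placeholders drawn from clipped normal distributions, not results of the study; the spreads 0.14 M and 0.04 M are illustrative values loosely inspired by Table 3, and `rmse` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def rmse(estimates, truth):
    """Root mean square error of estimated locations around the true location."""
    return float(np.sqrt(np.mean((np.asarray(estimates) - truth) ** 2)))


rng = np.random.default_rng(0)
true_loc = 0.1  # true QTL location in Morgans on a 0.4 M linkage group

# Placeholder estimates for a small and a large design (illustrative spreads only)
est_small = np.clip(rng.normal(true_loc, 0.14, 5000), 0.0, 0.4)
est_large = np.clip(rng.normal(true_loc, 0.04, 5000), 0.0, 0.4)

err_small = np.abs(est_small - true_loc)
err_large = np.abs(est_large - true_loc)

# One-sided test: are errors in the small design stochastically larger?
res = mannwhitneyu(err_small, err_large, alternative="greater")
print("RMSE small:", rmse(est_small, true_loc), "RMSE large:", rmse(est_large, true_loc))
print("MWW p-value:", res.pvalue)
```

The one-sided alternative matches the question asked in the text, namely whether the location error shrinks when the population size grows.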



Table 1 Proportion of estimated QTL locations at marker locations according to the population size under H0 

Number of individuals (s x d x p)¹    Proportion² (%)
60 (3 x 1 x 20)                       64.2
100 (5 x 1 x 20)                      63.8
300 (5 x 2 x 30)                      65.3
400 (5 x 2 x 40)                      67.1
800 (5 x 4 x 40)                      66.0

¹ The population structure was a mixture of full and half sib families for given numbers of sires (s), dams per sire (d), and progeny per dam (p). 
² Proportion of QTL located at a marker position (3 markers at 0 M, 0.2 M and 0.4 M). 
Figure 7 Empirical distribution of the estimated QTL location according to the marker density under H0. Results are based on 5000 
simulations per case in an outbred population of 300 individuals. The number of markers varied from 2 to 11 along a linkage group of 
0.6 M. The line represents the density obtained with a kernel method. 



Impact of the marker density 

Figure 9 reports the QTL location distribution for a 
marker density from 2 to 11 markers equally distributed 
on a 0.6 M interval. One QTL was simulated at 0.25 M. 
It shows the advantage of using a high marker density 
in the QTL detection: when the marker spacing was 
minimal (i.e. 6 cM), the location of the QTL was very 
accurate. Table 4 shows that, when the density 
increased, the RMSE of the estimated QTL location 
decreased, so the QTL location was more accurately 
estimated. The dependency between the power of the 
analysis and the extent of the bias was confirmed. In 
contrast, the RMSE and the proportion 
of QTL mapped at a marker location varied little with $\alpha$. 
Finally, as under H0, these tendencies were confirmed 
for the proportion of QTL located at a marker location, 
except for 11 markers. 
Impact of the QTL effect 

The empirical distributions of the estimated QTL 
location when the QTL effect increased from 0.5 phenotypic 
standard deviation ($\sigma$) to 4$\sigma$ are shown in Figure 10. One 
QTL was simulated at 0.1 M on a 0.4 M linkage group 
with three equally spaced markers. The power of the 
QTL detection, the RMSE of the estimated QTL 
location and the proportion of estimated QTL locations at a 
marker are given in Table 5. The results indicated that 
the bias decreased as the QTL effect increased. 

Table 2 Proportion of estimated QTL locations at marker 
locations according to the marker density under H0 

Number of markers¹    Proportion² (%)
2                     58.6
3                     58.5
5                     52.6
7                     57.8
11                    69.5

¹ The markers were evenly distributed on a linkage group of 0.6 M. 
² Proportion of QTL located at a marker location for a population of 300 
individuals. 



As seen previously, the bias decreased when the power 
increased, but the RMSE and the proportion of QTL 
mapped at a marker position were only slightly 
dependent on the test level. The KS test indicated that the 
distribution of the estimated QTL position was 
significantly different when the QTL effect changed (p-values 
< 2.2e-16). The MWW test showed that, when the 
QTL effect increased, the median of the error of the 
estimated QTL position decreased (p-values < 
2.2e-16). 

Impact of the true QTL location 

Figure 11 shows the variation in the distribution of the 
estimated QTL location when the simulated QTL posi- 
tion (A on the X axis) moved towards the middle of two 
flanking markers on a 0.4 M linkage group. Table 6 
shows the power of the QTL detection, the RMSE of 
the estimated QTL location and the proportion of esti- 
mated QTL positions at a marker position when the 
true QTL location changed from 0 M to 0.2 M. When 
the true QTL location tended towards the flanking mar- 
kers, the power went up and the proportion of QTL 
locations at a marker location increased. On the con- 
trary, the RMSE went down. The KS test confirmed the 
difference between the distributions of the estimated 
QTL location when the true QTL location varied 
(p-values < 2.2e-16). The MWW test, which compared 
the medians of error of the estimated QTL location, 
confirmed the increase in accuracy when the true QTL 
location tended towards a marker location. 

An algorithm to adjust the location of QTL mapped on 
markers 

Figure 8 Empirical distribution of the estimated QTL location according to the population size under H1. Num. Offsp indicates the 
number of offspring in the outbred population. Results are based on 5000 simulations per case. There were three markers at 0 M, 0.2 M and 0.4 
M. A indicates the true QTL location. The line represents the density obtained with a kernel method. 

Analytical developments in a backcross type population 
and the simulation study in an outbred type population 
demonstrated that the estimated position of the QTL is 
biased towards marker locations under some 
circumstances. Moreover, the decomposition of the 
LRT according to formula (4) made it possible to identify a 
putative cause of this bias: the residual error $\varepsilon_1$ in the 
LRT under both the "no QTL" and the "one QTL" 
hypotheses. Indeed, according to the decomposition of the LRT 
in formula (4), if the QTL is not estimated at its 
true location, two residual errors may have generated 
the bias: $\varepsilon_1$ and $\varepsilon_2$. When the estimated QTL position is 
at a marker location, $\arg\max_t \varepsilon_2(t)$ has a uniform 
distribution but $\arg\max_t \varepsilon_1(t)$ is more often estimated at a 
marker location than between markers. In such a 
situation, $\varepsilon_1$ is very likely to play a dominant role in the bias. 
On the contrary, when the estimated QTL location is 
not at a marker location, $\arg\max_t \varepsilon_1(t)$ and $\arg\max_t \varepsilon_2(t)$ 
are unknown for a given $\arg\max_t [\varepsilon_1(t) + \varepsilon_2(t)]$ error. 
Under these circumstances, it is impossible to predict 
the relative influence of $\varepsilon_1$ and $\varepsilon_2$ on the bias. On the 
basis of this observation, we propose an approach to 
describe the $\varepsilon_1(t)$ process and, consequently, adjust the 
estimated QTL position when a QTL co-localizes with 
one marker, i.e. an approach to correct the "marker 
effect" on the bias of the estimated QTL location. 

1. Obtain the vector which contains the LRT profile 
along the linkage group, calculated on the phenotypic data, 
say $L_0$. $L_0$ is maximum at the location of the marker $M$. 

2. Under the "no QTL" hypothesis, simulate 
phenotypes and obtain LRT profiles until obtaining 1 000 profiles 


Table 3 Power, RMSE of the QTL location and proportion of estimated QTL locations at marker locations according to 
the population size under H1 

                            Number of individuals (s x d x p)¹
                   α²      100 (5 x 1 x 20)    300 (5 x 2 x 30)    800 (5 x 4 x 40)
Power³ (%)         0.05    39                  93                  100
                   0.01    15                  81                  100
RMSE⁴ (cM)         -       13.9                8.7                 4.2
                   0.05    12.2                8.4                 4.2
                   0.01    11.4                8.0                 4.2
Proportion⁵ (%)    -       39.9                15.2                1.6
                   0.05    29.7                14.0                1.6
                   0.01    26.8                12.9                1.6

¹ The population structure was a mixture of full and half sib families for given numbers of sires (s), dams per sire (d), and progeny per dam (p). 
² Significance level for the LRT. 
³ Power of the QTL detection at significance level α. 
⁴ RMSE of the estimated QTL location, the true QTL being located at 0.1 M on a 0.4 M linkage group. 
⁵ Proportion of QTL located at a marker location (3 markers at 0 M, 0.2 M and 0.4 M). 
Figure 9 Empirical distribution of the estimated QTL location according to the marker density under H1. Results are based on 5000 
simulations per case in an outbred population of 300 individuals. The number of markers varied from 2 to 11 along a linkage group of 0.6 M. A 
indicates the true QTL location. The line represents the density obtained with a kernel method. 



which have their maximum at the position of the 
marker $M$, say $\{L_i\}_{i=1,\ldots,1000}$. 

3. Calculate the 1 000 vectors $V_i = L_0 - L_i$, $i \in \{1, \ldots, 1000\}$. 

4. Retain the 1 000 locations where each $V_i$, $i = 1, \ldots, 1000$, is 
maximum. 

5. Obtain the adjusted position of the QTL as the 
mean of these positions: 

$$P = \mathrm{Mean}\{p_i \mid p_i = \arg\max V_i,\ i = 1, \ldots, 1000\}.$$
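
The five steps above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' R code: it uses a single interval flanked by two markers, the Haldane mapping function, the regression form of the LRT, and null phenotypes drawn with unit variance. The names `design_matrix`, `lrt_profile` and `adjust_location` and all parameter values are hypothetical.

```python
import numpy as np


def haldane(d):
    """Recombination rate for a map distance d in Morgans (no interference)."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(d)))


def design_matrix(m1, m2, grid, T):
    """Rows are grid positions t, columns are individuals: x_k(t) given markers (+1/-1)."""
    th_l, th_r, th_T = haldane(grid)[:, None], haldane(T - grid)[:, None], haldane(T)
    same = (1.0 - th_l - th_r) / (1.0 - th_T)  # non-recombinant marker pair
    diff = (th_r - th_l) / th_T                # recombinant marker pair
    return np.where(m1[None, :] == m2[None, :], m1[None, :] * same, m1[None, :] * diff)


def lrt_profile(X, y):
    """Assumed regression form of the LRT along the grid."""
    return (X @ y) ** 2 / np.sum(X ** 2, axis=1)


def adjust_location(y, X, grid, marker_idx, n_keep=1000, rng=None, max_draws=200_000):
    """Steps 1 to 5: mean of argmax(L0 - Li) over null profiles peaking at the marker."""
    rng = np.random.default_rng() if rng is None else rng
    L0 = lrt_profile(X, y)                                # step 1
    positions = []
    for _ in range(max_draws):
        Li = lrt_profile(X, rng.standard_normal(len(y)))  # step 2: "no QTL" phenotypes
        if np.argmax(Li) != marker_idx:
            continue                                      # keep only profiles peaking at M
        positions.append(grid[np.argmax(L0 - Li)])        # steps 3 and 4
        if len(positions) == n_keep:
            break
    if len(positions) < n_keep:
        raise RuntimeError("too few null profiles peaked at the marker")
    return float(np.mean(positions))                      # step 5


rng = np.random.default_rng(0)
T, n, a, t0 = 0.2, 100, 1.0, 0.1
grid = np.linspace(0.0, T, 21)
for _ in range(1000):  # redraw data until the observed maximum sits on a marker
    m1 = rng.choice([-1.0, 1.0], size=n)
    g = np.where(rng.random(n) < haldane(t0), -m1, m1)
    m2 = np.where(rng.random(n) < haldane(T - t0), -g, g)
    y = 0.5 * a * g + rng.standard_normal(n)
    X = design_matrix(m1, m2, grid, T)
    k = int(np.argmax(lrt_profile(X, y)))
    if k in (0, len(grid) - 1):
        break
else:
    raise RuntimeError("no simulated data set peaked on a marker")

print("raw estimate:", grid[k],
      "adjusted:", adjust_location(y, X, grid, k, n_keep=200, rng=rng))
```

The marker genotypes are held fixed across the null simulations so that every profile $L_i$ is computed on the same design as $L_0$. With real data the null phenotypes would need to be simulated on the scale of the observed trait, which this sketch sidesteps by assuming unit variance.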

In order to verify the validity of this proposal, 
simulations were carried out in R for a backcross type design 
of 100 progeny. There were six markers equally 
distributed on a 1 M linkage group. There was one QTL with an effect of 0.5$\sigma$, 
for which two true locations were considered: 
0.1 M, i.e. between markers, and 0.2 M, i.e. on a marker. 

Table 4 Power, RMSE of the QTL location and proportion 
of estimated QTL locations at marker locations according 
to the marker density under H1 

                            Number of markers¹
                   α²      2       3       5       7       11
Power³ (%)         0.05    62.2    91.6    94.4    96.5    97.5
                   0.01    37.2    79.0    85.2    88.9    91.8
RMSE⁴ (cM)         -       16.6    10.5    8.8     7.4     5.7
                   0.05    14.9    10.1    8.4     7.2     5.6
                   0.01    14.1    9.7     8.0     6.9     5.2
Proportion⁵ (%)    -       19.0    15.4    12.5    11.8    35.0
                   0.05    13.0    14.6    11.5    11.5    34.5
                   0.01    10.8    13.7    10.7    10.5    34.0

¹ The markers were evenly located in a linkage group of 0.6 M. 
² Significance level for the LRT. 
³ Power of the QTL detection at significance level α in a population of 300 
progeny. 
⁴ RMSE of the estimated QTL location, the true QTL being located at 0.07 M on 
a 0.6 M linkage group. 
⁵ Proportion of QTL located at a marker location. 



Table 7 shows the comparison of RMSE between 
"before" and "after" adjusting the estimated QTL loca- 
tion when the estimated QTL position was on one of 
the markers in the first place. The distributions of esti- 
mated QTL location before and after the adjustment of 
the estimated QTL location are presented in Figure 12. 
The RMSE was always smaller after the position had 
been adjusted, even when the true QTL location was at 
0.2 M, i.e. on a marker location. In this example, the 
proportion of false QTL locations on the markers was 
effectively decreased by the proposed algorithm. 

Discussion 

In order to study the elements that give rise to the bias on 
the estimated QTL position, we checked whether the dis- 
tribution of the test statistic changed along the locations 
on the linkage group. More precisely, we checked if the 
significance threshold remained the same at a marker and 
at a non-marker location. Under the null hypothesis of 
"no QTL on the linkage group", the asymptotic 
distribution of the LRT at a given point is well known and 
identical for all locations. It converges to the central 
$\chi^2$ distribution with a number of degrees of freedom depending on the 
number of parameters fixed under the null hypothesis 
[20], i.e. here the number of sires or dams for which a 
QTL effect was estimated. Nevertheless, the population 
size is most often not large enough for the LRT 
to reach its asymptotic distribution at all the locations on 
the linkage group. The variability of the marker 
informativity along the linkage group may actually influence this 
convergence to asymptotic conditions, resulting in 
variability of the LRT distributions depending on the tested 
locations. Here, the differences between the empirical 
distributions of the LRT at each position along the linkage 
group were explored using a real example of an outbred 
type population. It appeared that the variability of 
informativity along the linkage group did not lead to a 
significant variability of the empirical distributions of the 
nominal test statistics. This observation does not 
contradict the bias of the estimated QTL location towards the 
locations of markers, but it means that the bias is due to 
the process which defines the supremum of the LRT on the linkage 
group. 

Figure 10 Empirical distribution of the estimated QTL location according to the QTL effect under H1. Results are based on 5000 
simulations per case in an outbred population of 100 progeny. There were three markers at 0 M, 0.2 M and 0.4 M. The QTL effect varied from 
0.5$\sigma$ to 4$\sigma$. A indicates the true QTL location. The line represents the density obtained with a kernel method. 

Some analytical results concerning the bias of the esti- 
mated QTL location were obtained in a backcross type 
population, i.e. a backcross between inbred lines. We 
identified a common source of bias under the "no QTL" 
and the "one QTL" hypotheses, and also showed the 
possible influence of several parameters under "one 
QTL" hypothesis, such as the population size, the mar- 
ker density, the QTL effect and the true QTL location. 
Using simulations, we verified the existence of a bias on 

Table 5 Power, RMSE of the QTL location and proportion 
of estimated QTL locations at marker locations according 
to the QTL effect under H1 

                            QTL effect
                   α¹      0.5 σ    1 σ     1.5 σ    2 σ     4 σ
Power² (%)         0.05    10.5     41.8    81.0     91 A    100
                   0.01    2.9      19.5    58.1     89.8    100
RMSE³ (cM)         -       17.0     13.6    10.3     8.0     4.4
                   0.05    15.6     11.9    9.8      7.8     4.4
                   0.01    15.4     10.9    9.1      7.6     4.4
Proportion⁴ (%)    -       56.6     37.9    24.6     13.6    2.4
                   0.05    37.1     28.8    22.2     13.2    2.4
                   0.01    31.0     24.6    20.3     12.9    2.4

¹ Significance level for the LRT. 
² Power of the QTL detection for significance level α. 
³ RMSE of the estimated QTL location in a population of 100 progeny with a 
true QTL located at 0.1 M. 
⁴ Proportion of QTL located at a marker location (3 markers at 0 M, 0.2 M and 
0.4 M). 



the estimation of the QTL location using the interval 
mapping method, under the null and the alternative 
hypotheses, when family structure are more complex 
than the backcross design considered by Walling et al. 
[4]. Simulations of outbred populations confirmed that 
this bias is influenced by the size of the population and 
the density of the genetic map, as well as by the QTL 
effect under the alternative hypothesis. We also demon- 
strated that the true QTL location, relatively to the 
flanking markers, had a significant impact on the accu- 
racy of the estimated QTL location. Moreover, we quan- 
tified the bias of the estimated QTL location for various 
values of these parameters and validated the results by 
applying appropriate test statistics. 

We showed that the population size does not affect the estimation of
the QTL location under the null hypothesis. Under the alternative
hypothesis, very similar values of the RMSE and of the proportion of
QTL detected at marker locations were observed whatever the
significance level α. A slight reduction in the bias seemed to be
obtained when applying α < 0.01. However, choosing such a stringent
significance level also implies a decrease in power and the detection
of only a few QTL. As a consequence, it cannot be considered an
efficient way to correct the bias problem.

Considering these results leads to a first recommendation: the number
of animals and/or markers must be adjusted to the desired test power
and location accuracy. Figure 13 summarizes the variability of the
bias according to the population size and the QTL effect. The
proportion of QTL mapped at a marker location and the RMSE of the QTL
location show the same tendencies as the QTL effect and the population
size vary. This confirms that the bias on the estimated QTL location
is essentially due to QTL being located at markers.

Figure 11 Empirical distribution of the estimated QTL location according to the true QTL location under H1. Results are based on 5000
simulations per case of an outbred population of 300 progeny. There were two flanking markers at 0 M and 0.4 M. The true QTL location varied
from 0 M to 0.2 M. A indicates the true QTL location. The line represents the density obtained with a kernel method.



Secondly, concerning the particular case of eQTL detection, when the
marker information is relatively sparse, for example when
microsatellite markers are used for genotyping, it is necessary to
measure transcriptomic data on several hundred animals to obtain an
accurate eQTL location. A population size of 300 progeny seems to be
a good compromise for the detection of eQTL, even if only those with
relatively large effects will be detected.

Thirdly, it is clear that a significant QTL detected at a marker
position should be considered with caution, especially when the
population size, the marker density or the QTL effect is low. In such
situations, the approach proposed above helps to remedy the bias on
the estimated QTL location.



Table 6 Power, RMSE of the QTL location and proportion
of estimated QTL locations at marker locations according
to the true QTL location under H1

                              QTL location (M)
                    α¹      0        0.05     0.1      0.15     0.2
Power² (%)          0.05    96.1     92.3     87.2     82.0     81.2
                    0.01    87.4     80.1     69.7     61.9     60.2
RMSE³ (cM)          all      7.5      7.6      9.5     10.7     11.2
                    0.05     7.2      7.2      9.0     10.1     10.7
                    0.01     7.0      7.0      8.6      9.7     10.3
Proportion⁴ (%)     all     51.5     38.6     26.6     19.0     16.2
                    0.05    51.3     37.8     24.8     15.8     13.6
                    0.01    50.9     37.2     24.3     13.8     11.2

1 Significance level for the LRT; "all" denotes all simulations, whether significant or not.

2 Power of the QTL detection for significance level α.

3 RMSE of the estimated QTL location in a population of 300 progeny with a
true QTL located at 0.1 M on a 0.4 M linkage group.

4 Proportion of QTL located at a marker location (2 markers located at 0 M and
0.4 M).



Conclusions 

When we apply the interval mapping method on an 
outbred design to map QTL, the QTL is often incorrectly 
mapped at the position of a marker. In this work, this bias 
was studied by using analytical developments in backcross 
type population and simulated data in outbred popula- 
tions. In the absence of QTL, adjusting the thresholds at 
the location of markers cannot reduce the bias, and the 
population size does not affect the bias. Under the hypoth- 
esis of having one QTL, the impact of some parameters on 
the bias was confirmed: when the population size and/or 
the QTL effect and/or the marker density are large 
enough, the bias is reduced. Moreover, the closer the QTL 
is to a marker location, the more accurate the estimation 
is. Therefore, caution should be taken when the QTL is 
mapped at a position of a marker, in particular for low 
power designs. In such cases, a method is proposed to cor- 
rect the bias on the estimated QTL location. Simulations 
carried out in a backcross type population demonstrated 
that this method is valid to limit the bias. 

Methods 

Analyses on a real data set in pig 

A real data set was used to illustrate some aspects of the present
work. It comes from a porcine outbred population of 325 progeny from
4 sires. One example of eQTL analysis using the QTLMap software [21]
was given, i.e. the analysis of chromosome SSC18 for 6 665 gene
expression traits. The linkage analysis method was applied according
to Le Roy et al. [22] with a gene-by-gene procedure. For each gene,
when the LRT was significant at the 5‰ chromosome-wide level, the
estimated eQTL location was the location where the LRT was maximum
on the linkage group.

The same familial structure was used to study the empirical
distribution of the LRT along the linkage group. Two thousand
simulations were performed under the null hypothesis of "no QTL" on
the chromosome SSC1, which carried 16 microsatellite markers. A
polygenic heritability coefficient of 0.5 was assumed for the trait
(see http://www.inra.fr/qtlmap).

Table 7 RMSE of the QTL location before or after
adjustment

                True QTL location (M)
                0.1      0.2
Before¹         28.6     23.6
After²          26.6     21.3

Results based on 5000 simulations per case in a backcross population of 100
progeny. There were 6 markers equally distributed on 1 M.

1 RMSE for the estimated QTL location.

2 RMSE for the estimated QTL location after adjusting the estimated QTL
location by the proposed algorithm.

Simulations of an Ornstein-Uhlenbeck process 

The asymptotic distribution of the LRT process at marker positions
was shown to be the square of an Ornstein-Uhlenbeck (OU) process
[16]. Let $X_t$ denote the value of this OU process at location $t$.
$X_t$ can be described as:

\[ dX_t = -2 X_t \, dt + 2 \, dW_t \]

where $W_t$ denotes Brownian motion.

In a backcross type population, the mean of this process is 0 and the
autocovariance is $\mathrm{cov}(X_t, X_{t'}) = e^{-2|t - t'|}$, with
$t$ and $t'$ in Haldane distance units [16].

To simulate this process, we considered a linkage group with mk
markers. We generated mk independent random numbers
$z_0, z_1, \dots, z_{mk}$ from a normal distribution with mean 0 and
variance 1 with the function rnorm in R. We defined $X_0 = z_0$. Then,
a discrete analog of the OU process [23] was given by:

\[ X_s = e^{-2\tau} X_{s-1} + \sqrt{1 - e^{-4\tau}} \, z_s \]

with $s = 1, \dots, mk$, where $\tau$ denotes the spacing between two
adjacent markers in Morgans. This sequence is a first-order
autoregressive sequence.
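
The recursion above can be sketched in a few lines. The following Python snippet is only an illustration, not the R code used in the study: the function names `simulate_ou_at_markers` and `correlation` are hypothetical, and Python's `random.gauss` stands in for `rnorm`. It draws the first-order autoregressive sequence and lets one check that adjacent values are correlated at about $e^{-2\tau}$.

```python
import math
import random


def simulate_ou_at_markers(mk, tau, rng):
    """Return X_0, ..., X_mk from the discrete analog of the OU process.

    mk  : number of recursion steps (s = 1, ..., mk)
    tau : spacing between adjacent markers, in Morgans
    rng : a random.Random instance (assumption: replaces R's rnorm)
    """
    phi = math.exp(-2.0 * tau)                  # autoregressive coefficient
    sd = math.sqrt(1.0 - math.exp(-4.0 * tau))  # innovation standard deviation
    x = [rng.gauss(0.0, 1.0)]                   # X_0 = z_0
    for _ in range(mk):
        x.append(phi * x[-1] + sd * rng.gauss(0.0, 1.0))
    return x


def correlation(u, v):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


if __name__ == "__main__":
    rng = random.Random(2012)
    path = simulate_ou_at_markers(5, 0.2, rng)
    # Squaring the process gives one draw of the asymptotic LRT at the markers.
    print([round(v * v, 3) for v in path])
```

Because the innovation variance is $1 - e^{-4\tau}$, every $X_s$ keeps unit variance, which is what makes the squared values comparable across markers.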

Simulations of outbred type population 

The QTLMap software [21] was used to simulate and analyse the data
sets. QTLMap allowed the simulation of complete experimental designs
with pedigree, genetic map, genotypes and phenotypes
(http://www.inra.fr/qtlmap). The population structure was a mixture
of full and half sib families for given numbers of sires (s), of dams
per sire (d) and of progeny per dam (p). Most often, 3 markers were
equally distributed on a 0.4 M linkage group. Each marker had 6
alleles with equal frequencies in the parental population. The QTL
was simulated at 0.1 M and all sires and dams were heterozygous for
the QTL. The phenotypes of the progeny were simulated as follows:

\[ y_{ijk} = u_i + u_{ij} + a \, g_{ijk}(t_0) + e_{ijk} \qquad (5) \]

where $y_{ijk}$ is the phenotype of the progeny $ijk$ of the sire $i$
and of the dam $ij$. $u_i$ and $u_{ij}$ denote the polygenic effects
of the sire $i$ and of the dam $ij$ respectively, which follow a
normal distribution with mean 0 and variance $\sigma_u^2$. $a$ denotes
the QTL allelic substitution effect and $g_{ijk}(t_0)$ is the
genotypic value of $ijk$ at the QTL location $t_0$. $g_{ijk}$ takes
value 1, 0 or -1 depending on the QTL genotype, QQ, Qq or qq,
respectively. $e_{ijk}$ is a random normal variable with mean 0 and
variance $\sigma_e^2$. The variance within QTL genotype is
$\sigma^2 = 2\sigma_u^2 + \sigma_e^2$ and $a$ is expressed in
$\sigma$ units. The heritability coefficient, equal to
$4\sigma_u^2/\sigma^2$, was fixed at 0.25.

Figure 12 Empirical distribution of the estimated QTL locations before and after adjustment. Results are based on 5000 simulations per
case in a backcross population of 100 progeny. There were six markers equally distributed on a linkage group of 1 M. The true QTL location,
indicated as A, was set at 0.1 M or 0.2 M.

Figure 13 Distribution of the bias depending on the population size and the QTL effect. a. Proportion of estimated QTL locations at
marker positions (x-axis: QTL effect). b. RMSE of the QTL location (x-axis: QTL effect). Results are based on 5000 simulations per case. There were three markers at 0 M, 0.2 M and 0.4 M.
The true QTL location was set at 0.1 M. 5*1*20 indicates that there were 5 sires, 1 dam per sire and 20 progeny per dam in the design.

Wang et al. BMC Genetics 2012, 13:29
http://www.biomedcentral.com/1471-2156/13/29
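
As a rough illustration of the phenotype model, the sketch below draws phenotypes for an s x d x p design in Python. It is not QTLMap code: the function names are hypothetical, the QTL genotype is drawn directly (each heterozygous parent transmits Q with probability 1/2) rather than through a genetic map, and the residual variance is taken as $\sigma_e^2 = \sigma^2 - 2\sigma_u^2$, which follows from the variance decomposition stated above.

```python
import math
import random


def variance_components(h2=0.25, sigma=1.0):
    """Split sigma^2 into polygenic (per parent) and residual variances."""
    var_u = h2 * sigma ** 2 / 4.0        # from h2 = 4 * var_u / sigma^2
    var_e = sigma ** 2 - 2.0 * var_u     # from sigma^2 = 2 * var_u + var_e
    return var_u, var_e


def simulate_phenotypes(s, d, p, a_in_sigma, h2=0.25, sigma=1.0, rng=None):
    """Return a list of (sire, dam, progeny, g, y) tuples.

    a_in_sigma : QTL substitution effect expressed in sigma units
    """
    rng = rng or random.Random()
    var_u, var_e = variance_components(h2, sigma)
    sd_u, sd_e = math.sqrt(var_u), math.sqrt(var_e)
    a = a_in_sigma * sigma
    records = []
    for i in range(s):
        u_i = rng.gauss(0.0, sd_u)            # sire polygenic effect
        for j in range(d):
            u_ij = rng.gauss(0.0, sd_u)       # dam polygenic effect
            for k in range(p):
                # Both parents are Qq: count transmitted Q alleles, minus 1,
                # so that QQ -> 1, Qq -> 0, qq -> -1.
                g = (rng.random() < 0.5) + (rng.random() < 0.5) - 1
                y = u_i + u_ij + a * g + rng.gauss(0.0, sd_e)
                records.append((i, j, k, g, y))
    return records


if __name__ == "__main__":
    data = simulate_phenotypes(5, 1, 20, a_in_sigma=1.0, rng=random.Random(7))
    print(len(data), "progeny simulated")
```

With a heritability of 0.25 and $\sigma = 1$, this gives $\sigma_u^2 = 0.0625$ and $\sigma_e^2 = 0.875$.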

For each of the cases studied, the results were based on 5000
simulations, either under the null hypothesis (H0: there is no QTL
segregating on the linkage group, i.e. a = 0) or under the
alternative hypothesis (H1: there is one QTL segregating on the
linkage group, i.e. most often a = 1σ). For each simulated dataset,
the estimated QTL position was the location on the linkage group
where the LRT was maximum.

Under the null hypothesis, simulations were carried out so as to
compare the influence of the population size with 6 levels: 60
(3s,1d,20p), 80 (4s,1d,20p), 100 (5s,1d,20p), 300 (5s,2d,30p), 400
(5s,2d,40p) and 800 (5s,4d,40p) progeny. Under the H1 hypothesis,
only 3 of these population sizes were considered: 100, 300 and 800
progeny.

To understand how the QTL effect affects the estimation of the QTL
location, a population of 100 progeny was simulated with a QTL effect
ranging from 0.5 σ to 4 σ.

Other simulations were performed in a population of 300 progeny.
Firstly, to check the extent of the bias depending on the marker
density, samples with 2, 3, 5, 7 or 11 equidistant markers on a
linkage group of 0.6 M were simulated, under the null and under the
alternative hypotheses. Under H1, one QTL was simulated at 0.25 M
(a = 1σ). Secondly, to test how the true QTL location may affect the
bias, we performed simulations under H1 with a QTL (a = 1σ) lying at
0 M, 0.05 M, 0.10 M, 0.15 M or 0.2 M on a linkage group of 0.4 M with
two flanking markers at 0 M and 0.4 M.
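
The designs listed above can be written down as a small configuration table. The Python sketch below is only a convenience for readers who want to reproduce the layout; the names `DESIGNS` and `progeny_count` are hypothetical and do not come from QTLMap.

```python
# Population size -> (sires, dams per sire, progeny per dam), as listed above.
DESIGNS = {
    60: (3, 1, 20),
    80: (4, 1, 20),
    100: (5, 1, 20),
    300: (5, 2, 30),
    400: (5, 2, 40),
    800: (5, 4, 40),
}

# Population sizes that were also studied under the alternative hypothesis.
H1_SIZES = (100, 300, 800)


def progeny_count(s, d, p):
    """Total number of progeny in a design with s sires, d dams, p progeny."""
    return s * d * p


if __name__ == "__main__":
    for size, (s, d, p) in sorted(DESIGNS.items()):
        print(size, f"{s}*{d}*{p}", progeny_count(s, d, p))
```

Each key equals the product s x d x p of its design, which is a quick consistency check on the six population sizes.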

Criteria 

In each of the cases studied, the empirical distribution of the
estimated QTL location was obtained from the 5000 locations of the
maximum of the LRT over the simulations. The proportion of
simulations for which the QTL location was estimated at a marker
position was retained to quantify the variability of the bias
depending on the parameters. Under the alternative hypothesis,
besides the proportion of QTL that co-localized with a marker, the
root mean squared error (RMSE) of the QTL location was chosen to
describe how the bias varies with the parameters. In addition, the
power of the QTL detection was calculated for type I error rates
α = 0.01 and 0.05. For all simulations and for simulations with a
significant QTL at the level α = 0.01 or 0.05, the proportion of QTL
that were estimated at a marker location and the RMSE of the
estimated QTL location were computed. The RMSE was given by the
following formula:

\[ \mathrm{RMSE} = \sqrt{\frac{1}{L} \sum_{l=1}^{L} (\hat{t}_l - t_0)^2} \]

where $L$ is the number of simulations (all, significant at the level
0.05 or at the level 0.01), $\hat{t}_l$ is the $l$-th estimated QTL
position and $t_0$ is the true QTL position.
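
Both criteria are simple to compute from a list of estimated positions. The following Python sketch is a minimal illustration with hypothetical function names; the tolerance `tol` used to decide that an estimate sits at a marker is an assumption, since the scan grid of the original analysis is not described here.

```python
import math


def rmse(estimates, t0):
    """Root mean squared error of estimated positions around the true one."""
    return math.sqrt(sum((t - t0) ** 2 for t in estimates) / len(estimates))


def proportion_at_markers(estimates, markers, tol=1e-9):
    """Share of estimates that fall on a marker position (within tol)."""
    hits = sum(1 for t in estimates
               if any(abs(t - m) <= tol for m in markers))
    return hits / len(estimates)


if __name__ == "__main__":
    # Toy values in Morgans: two estimates on markers, two at the true QTL.
    est = [0.0, 0.2, 0.1, 0.1]
    print(rmse(est, t0=0.1), proportion_at_markers(est, (0.0, 0.2, 0.4)))
```

To restrict either criterion to significant simulations, one would filter `estimates` beforehand by comparing each LRT maximum with the relevant threshold.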

Hypothesis test 

Appropriate statistical tests are needed to evaluate which parameters
affect the bias of the estimated QTL position. ANOVA was not adequate
to test the equality of the average QTL position in two different
conditions (e.g. 2 population sizes) because the QTL position
estimator is not normally distributed. Therefore, two nonparametric
tests were combined in order to test which parameters affect the
bias, and how they influence the variation of the QTL location
estimation. This was performed in two steps: (1) the parameters that
influence the accuracy of the estimated QTL location were identified
with a Kolmogorov-Smirnov test; (2) for the parameters identified in
the first step, their effect on the accuracy of the estimated QTL
position was described with a Mann-Whitney-Wilcoxon test [24].

1. Kolmogorov-Smirnov test (KS): this test was applied in order to
check whether a parameter affected the estimation of the QTL
position. For each value of the parameter, an empirical distribution
of the estimated QTL location was obtained using 5000 simulations.
The two hypotheses compared by the KS test were:

\[ H_0: F_a = F_b \qquad H_1: F_a \neq F_b, \]

where $F_a$ and $F_b$ denote the distributions of the estimated QTL
position under the conditions a and b, respectively. For a given
parameter, all the distributions were compared pairwise with the
function ks.test in R. If no pairwise comparison rejected the null
hypothesis, the value of this parameter was considered not to
influence the estimation of the QTL position.
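
For readers without R at hand, the two-sample KS statistic can be sketched in plain Python as below. This is an illustration under stated assumptions, not a replacement for ks.test: the function names are hypothetical, and the p-value uses the classical asymptotic Kolmogorov series, which is only approximate when many estimates are tied at marker positions.

```python
import math


def ks_statistic(a, b):
    """Largest absolute gap between the two empirical distribution functions."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:   # step past all values tied at x
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_asymptotic_pvalue(d, n, m, terms=100):
    """Two-sided p-value from the asymptotic Kolmogorov distribution."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:                    # series converges slowly; p is ~1 here
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    d = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
    print(d, ks_asymptotic_pvalue(d, 4, 4))
```

Comparing all pairs of conditions then amounts to calling `ks_statistic` on each pair of 5000-value samples.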

2. Mann-Whitney-Wilcoxon test (MWW): when the null hypothesis in the
first step was rejected, the MWW test was used, with the function
wilcox.test in R, to understand how the parameter affected the
estimation. The hypotheses compared were:

\[ H_0: D_a = D_b \qquad H_1: D_a < D_b, \]

where $D_a$ and $D_b$ denote the absolute values of the deviations
between the estimated QTL position and the assumed, i.e. the true,
position under the conditions a and b, respectively. A smaller median
D corresponds to a more accurate position estimation.
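
The one-sided comparison of absolute deviations can be sketched as follows. This Python version is an assumption-laden illustration rather than a port of wilcox.test: the function names are hypothetical, the p-value relies on the normal approximation with a tie correction, and no continuity correction is applied.

```python
import math


def absolute_deviations(estimates, t0):
    """|estimated position - true position| for each simulation."""
    return [abs(t - t0) for t in estimates]


def mww_less(da, db):
    """Test H1: values in da tend to be smaller than values in db.

    Returns (U, p) where U counts pairs with da > db (ties count 1/2) and
    p is the lower-tail normal-approximation p-value.
    """
    n, m = len(da), len(db)
    total = n + m
    pooled = sorted([(v, 0) for v in da] + [(v, 1) for v in db])
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < total:
        j = i
        while j < total and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0          # average of ranks i+1 .. j
        size = j - i
        tie_term += size ** 3 - size
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_a - n * (n + 1) / 2.0
    var = n * m / 12.0 * ((total + 1) - tie_term / (total * (total - 1)))
    if var <= 0.0:
        return u, 1.0
    z = (u - n * m / 2.0) / math.sqrt(var)
    return u, 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    d_a = absolute_deviations([0.11, 0.09, 0.12, 0.10, 0.08], 0.1)
    d_b = absolute_deviations([0.0, 0.2, 0.0, 0.2, 0.4], 0.1)
    print(mww_less(d_a, d_b))
```

A small p-value supports the conclusion that condition a gives more accurate positions than condition b.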

Appendix 

Appendix I 

Let us denote by $M_k$ the genotype of the markers at 0 and $T$ for
individual $k$, and $p_{kt} = P(g_k(t) = 1 \mid M_k)$. Then, using a
linearized likelihood function instead of the mixture of two normal
distributions, the LRT can be written as:

\[
\mathrm{LRT}(t)
= -2 \ln \frac{\mathcal{L}(y_1, \dots, y_n)}{\max_a \mathcal{L}(y_1, \dots, y_n; a)}
= -2 \ln \frac{\prod_k \phi(y_k; 0, 1)}{\max_a \prod_k \phi\left(y_k + \frac{a}{2}(1 - 2p_{kt}); 0, 1\right)}
= \frac{\left[\sum_k y_k (1 - 2p_{kt})\right]^2}{\sum_k (1 - 2p_{kt})^2},
\]

where $\phi(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
and the distribution of $p_{kt}$ is given as:

$p_{kt}$                                                  Probability            $M_k$
$(1-\theta_t)(1-\theta_{T-t})/(1-\theta_T)$               $(1-\theta_T)/2$       $M_1M_1M_2M_2$
$(1-\theta_t)\theta_{T-t}/\theta_T$                       $\theta_T/2$           $M_1M_1M_2m_2$
$\theta_t(1-\theta_{T-t})/\theta_T$                       $\theta_T/2$           $M_1m_1M_2M_2$
$\theta_t\theta_{T-t}/(1-\theta_T)$                       $(1-\theta_T)/2$       $M_1m_1M_2m_2$

Note that $x_k(t) = E(g_k(t) \mid M_k) = 2p_{kt} - 1$, so the LRT at
each position $t$ can be described as

\[ \mathrm{LRT}(t) = \frac{\left[\sum_k y_k x_k(t)\right]^2}{\sum_k x_k^2(t)} \]

and the distribution of $x_k(t)$ for each individual $k$ is

$x_k(t)$                                                  Probability            $M_k$
$(1-\theta_t-\theta_{T-t})/(1-\theta_T)$                  $(1-\theta_T)/2$       $M_1M_1M_2M_2$
$(\theta_{T-t}-\theta_t)/\theta_T$                        $\theta_T/2$           $M_1M_1M_2m_2$
$(\theta_t-\theta_{T-t})/\theta_T$                        $\theta_T/2$           $M_1m_1M_2M_2$
$-(1-\theta_t-\theta_{T-t})/(1-\theta_T)$                 $(1-\theta_T)/2$       $M_1m_1M_2m_2$
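
The linearized LRT lends itself to a short numerical sketch. The Python code below is illustrative only: the function names are hypothetical, recombination rates are assumed to follow Haldane's map function, and marker genotypes are coded +1 (genotype $M_jM_j$) or -1 (genotype $M_jm_j$) at each of the two flanking markers, with $g_k(t)$ coded +1 or -1 accordingly.

```python
import math


def haldane(d):
    """Recombination rate for a map distance d in Morgans (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def x_value(m1, m2, t, T):
    """E[g(t) | flanking markers], i.e. 2 * p_kt - 1."""
    th_t, th_r, th_T = haldane(t), haldane(T - t), haldane(T)
    if m1 == m2:                               # non-recombinant marker class
        return m1 * (1.0 - th_t - th_r) / (1.0 - th_T)
    return m1 * (th_r - th_t) / th_T           # recombinant marker class


def lrt(y, markers, t, T):
    """Linearized LRT at position t for phenotypes y and marker pairs."""
    xs = [x_value(m1, m2, t, T) for m1, m2 in markers]
    den = sum(x * x for x in xs)
    if den == 0.0:
        return 0.0
    return sum(yk * xk for yk, xk in zip(y, xs)) ** 2 / den


def argmax_lrt(y, markers, T, steps=100):
    """Grid position in [0, T] where the LRT is largest."""
    grid = [T * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: lrt(y, markers, t, T))


if __name__ == "__main__":
    y = [0.9, -1.1, 0.4, -0.2]
    markers = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
    print(argmax_lrt(y, markers, T=0.2))
```

At a marker position the conditional expectation reduces to the marker code itself, so `x_value(m1, m2, 0, T)` returns `m1` and `x_value(m1, m2, T, T)` returns `m2`.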



Appendix II 

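Replacing $y_k = \frac{a}{2} x_k(t_0) + e_k$ in LRT (3), we have

\[
\mathrm{LRT}(t)
= \frac{a^2}{4} \frac{\left[\sum_k x_k(t_0) x_k(t)\right]^2}{\sum_k x_k^2(t)}
+ \frac{\left[\sum_k e_k x_k(t)\right]^2}{\sum_k x_k^2(t)}
+ a \, \frac{\left[\sum_k x_k(t_0) x_k(t)\right] \left[\sum_k e_k x_k(t)\right]}{\sum_k x_k^2(t)}
= f(t) + \varepsilon_1(t) + \varepsilon_2(t),
\]

where

• in the case of a large sample, we have
$f(t)/n \to \frac{1}{4} a^2 \mathrm{Var}(x(t_0)) \rho^2_{t,t_0}$ when
$n \to \infty$, according to the law of large numbers.

• $\varepsilon_1(t) = \frac{\left[\sum_k e_k x_k(t)\right]^2}{\sum_k x_k^2(t)}$
is the LRT under the "no QTL" hypothesis.

• $\varepsilon_2(t) \approx a \, \frac{\mathrm{Cov}(x(t), x(t_0))}{\mathrm{Var}(x(t))} \sum_k e_k x_k(t)$
is a residual error, a linear combination of Gaussian random
variables. Its distribution is approximated as
$N(0, n a^2 \mathrm{Var}(x(t_0)) \rho^2_{t,t_0})$.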

Appendix III 

Considering the first two terms in the expression of
$\frac{1}{n}\mathrm{LRT}(t)$ (4), when $n$ tends to infinity,
$\frac{1}{n}\varepsilon_1(t)$ will converge to 0 at each position $t$.
So the amplitude of the curve representing the term
$\frac{1}{n}\varepsilon_1(t)$, with respect to that of the first term,
is reduced. In the same way, as $a$ increases, the amplitude of the
first term becomes larger with respect to that of the second term.

The proof of the influence of $t_0$ and $T$ will use the results in
the following lemma:

Lemma 1. Given two markers at the locations 0 and $T$ in a linkage
group of length $T$ and assuming a QTL located at $t_0$, from the
distribution of $x(t)$ and applying the Taylor series expansion in
the case of small $T$, we have:

1. $\mathrm{Var}(x(t)) \approx \frac{4t^2}{T} - 4t + 1$

2. $\mathrm{Cov}(x(t), x(t_0)) \approx \frac{4 t t_0}{T} - 2(t + t_0) + 1$

Now let us assume $T < 1$ and consider the peak-to-peak amplitude of
the first term of $\frac{1}{n}\mathrm{LRT}(t)$,
$g(t) = \frac{1}{4} a^2 \mathrm{Var}(x(t_0)) \rho^2_{t,t_0}$, i.e.
the deviation between the highest amplitude value and the lowest
amplitude value:

\[ \delta = \max_{t \in [0,T]} g(t) - \min_{t \in [0,T]} g(t). \]

If $t_0 \in [\frac{T}{2}, T]$, then $\max_{t \in [0,T]} g(t)$ is
reached at $t = t_0$ and $\min_{t \in [0,T]} g(t)$ is reached at 0.
Hence, we have:

\[
\delta = \frac{a^2}{4} \left[\mathrm{Var}(x(t_0)) - \mathrm{Cov}^2(x(0), x(t_0))\right]
\approx a^2 \left(\frac{1}{T} - 1\right) t_0^2.
\]

It can be seen that when $t_0 \to T$ from $\frac{T}{2}$ and/or when
$T$ decreases, $\delta$ will become larger.

If $t_0 \in [0, \frac{T}{2}]$, then $\max_{t \in [0,T]} g(t)$ is
reached at $t = t_0$ and $\min_{t \in [0,T]} g(t)$ is reached at $T$.
Hence, we have:

\[
\delta = \frac{a^2}{4} \left[\mathrm{Var}(x(t_0)) - \mathrm{Cov}^2(x(T), x(t_0))\right]
\approx a^2 \left(\frac{1}{T} - 1\right) (t_0 - T)^2.
\]

It can be seen that when $t_0 \to 0$ from $\frac{T}{2}$, $\delta$
will become larger. Likewise, we set a constant $c$ for the distance
between $\arg\max_{t \in [0,T]} g(t)$ and
$\arg\min_{t \in [0,T]} g(t)$. Then, when $T$ changes, the position
of the QTL is $t_0 = T - c$ and

\[ \delta \approx a^2 \left(\frac{1}{T} - 1\right) c^2. \]

Therefore, when $T$ becomes larger, $\delta$ decreases.
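
The small-$T$ approximations can be checked numerically. The Python sketch below is an illustration under explicit assumptions (hypothetical function names, Haldane's map function, marker classes coded +1/-1 as in a backcross): it evaluates $g(t)$ exactly over the four marker classes and compares the resulting peak-to-peak amplitude with $a^2(\frac{1}{T} - 1)t_0^2$ or $a^2(\frac{1}{T} - 1)(t_0 - T)^2$.

```python
import math


def haldane(d):
    """Recombination rate for a map distance d in Morgans (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def x_value(m1, m2, t, T):
    """E[g(t) | flanking marker classes m1, m2 in {+1, -1}]."""
    th_t, th_r, th_T = haldane(t), haldane(T - t), haldane(T)
    if m1 == m2:
        return m1 * (1.0 - th_t - th_r) / (1.0 - th_T)
    return m1 * (th_r - th_t) / th_T


def g(t, t0, T, a=1.0):
    """Exact (a^2 / 4) * Cov^2(x(t), x(t0)) / Var(x(t))."""
    th_T = haldane(T)
    classes = [((1, 1), (1 - th_T) / 2), ((1, -1), th_T / 2),
               ((-1, 1), th_T / 2), ((-1, -1), (1 - th_T) / 2)]
    var = sum(p * x_value(m1, m2, t, T) ** 2 for (m1, m2), p in classes)
    cov = sum(p * x_value(m1, m2, t, T) * x_value(m1, m2, t0, T)
              for (m1, m2), p in classes)
    return 0.25 * a ** 2 * cov ** 2 / var


def delta_exact(t0, T, a=1.0, steps=200):
    """Peak-to-peak amplitude of g over a grid of [0, T] that includes t0."""
    grid = [T * i / steps for i in range(steps + 1)] + [t0]
    values = [g(t, t0, T, a) for t in grid]
    return max(values) - min(values)


def delta_approx(t0, T, a=1.0):
    """Small-T approximation, using the distance to the farther marker."""
    far = t0 if t0 >= T / 2 else T - t0
    return a ** 2 * (1.0 / T - 1.0) * far ** 2


if __name__ == "__main__":
    for t0 in (0.06, 0.08):
        print(t0, delta_exact(t0, 0.1), delta_approx(t0, 0.1))
```

For $T = 0.1$ M the exact and approximate amplitudes agree to within roughly ten percent, and both grow as $t_0$ moves from $\frac{T}{2}$ towards a marker.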

In conclusion, the amplitude of g(t) will be greater as the QTL
position tends to one marker and/or the distance between the markers
decreases.